Overview
Dataset statistics
| Number of variables | 19 |
|---|---|
| Number of observations | 9798 |
| Missing cells | 11439 |
| Missing cells (%) | 6.1% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 1.4 MiB |
| Average record size in memory | 152.0 B |
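The headline percentages follow from the counts in this table. A minimal sketch of the arithmetic, assuming missing cells are counted over all variables × observations and that 1 MiB is 2^20 bytes (the variable names are illustrative, not taken from the report):

```python
# Figures copied from the Dataset statistics table.
n_variables = 19
n_observations = 9798
missing_cells = 11439
avg_record_bytes = 152.0

total_cells = n_variables * n_observations               # every cell in the table
missing_pct = 100 * missing_cells / total_cells          # share of empty cells
total_mib = avg_record_bytes * n_observations / 2**20    # bytes -> MiB

print(f"Missing cells: {missing_pct:.1f}%")
print(f"Total size in memory: {total_mib:.1f} MiB")
```

Both results round to the values shown above (6.1% and 1.4 MiB).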
Variable types
| Text | 9 |
|---|---|
| Numeric | 6 |
| DateTime | 1 |
| Categorical | 3 |
Alerts
| final_budget is highly overall correlated with final_domestic_boxoffice and 1 other fields | High correlation |
| final_domestic_boxoffice is highly overall correlated with final_budget and 1 other fields | High correlation |
| final_worldwide_boxoffice is highly overall correlated with final_budget and 1 other fields | High correlation |
| certificate is highly imbalanced (50.0%) | Imbalance |
| enrichment_source is highly imbalanced (83.2%) | Imbalance |
| imdb_id has 845 (8.6%) missing values | Missing |
| production_companies has 1408 (14.4%) missing values | Missing |
| release_date has 149 (1.5%) missing values | Missing |
| final_year has 149 (1.5%) missing values | Missing |
| director has 827 (8.4%) missing values | Missing |
| star has 849 (8.7%) missing values | Missing |
| certificate has 1587 (16.2%) missing values | Missing |
| rating has 983 (10.0%) missing values | Missing |
| runtime has 903 (9.2%) missing values | Missing |
| genres has 907 (9.3%) missing values | Missing |
| production_countries has 1001 (10.2%) missing values | Missing |
| original_language has 965 (9.8%) missing values | Missing |
| enrichment_source has 866 (8.8%) missing values | Missing |
| final_worldwide_boxoffice has 428 (4.4%) zeros | Zeros |
| final_domestic_boxoffice has 752 (7.7%) zeros | Zeros |
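The per-variable missing counts listed above can be cross-checked against the dataset-level "Missing cells" figure. The sketch below simply re-adds the numbers from this report; it is a consistency check, not part of the profiling tool.

```python
# Missing-value counts copied from the alerts above.
missing = {
    "imdb_id": 845, "production_companies": 1408, "release_date": 149,
    "final_year": 149, "director": 827, "star": 849, "certificate": 1587,
    "rating": 983, "runtime": 903, "genres": 907,
    "production_countries": 1001, "original_language": 965,
    "enrichment_source": 866,
}
n_observations = 9798

total_missing = sum(missing.values())
print(total_missing)  # 11439, the "Missing cells" figure

# Per-variable share of missing rows, rounded to one decimal place.
shares = {name: round(100 * count / n_observations, 1)
          for name, count in missing.items()}
print(shares["certificate"])  # 16.2
```

The thirteen variables named in the alerts account for every missing cell, so the remaining six variables are complete.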
Reproduction
| Analysis started | 2025-03-31 19:57:07.140236 |
|---|---|
| Analysis finished | 2025-03-31 19:57:11.136089 |
| Duration | 4 seconds |
| Software version | ydata-profiling v4.15.1 |
Variables
final_title
Text
| Distinct | 7871 |
|---|---|
| Distinct (%) | 80.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 76.7 KiB |
Length
| Max length | 104 |
|---|---|
| Median length | 50 |
| Mean length | 15.194427 |
| Min length | 1 |
Unique
| Unique | 6054 |
|---|---|
| Unique (%) | 61.8% |
Sample
| 1st row | #Horror |
|---|---|
| 2nd row | (500) Days of Summer |
| 3rd row | 10,000 B.C. |
| 4th row | 10,000 BC |
| 5th row | 101 Dalmatians |
| Value | Count | Frequency (%) |
| the | 2934 | 10.9% |
| of | 857 | 3.2% |
| a | 334 | 1.2% |
| and | 269 | 1.0% |
| in | 240 | 0.9% |
| 2 | 214 | 0.8% |
| to | 187 | 0.7% |
| | 150 | 0.6% |
| man | 126 | 0.5% |
| movie | 92 | 0.3% |
| Other values (7165) | 21467 |
Most occurring characters
| Value | Count | Frequency (%) |
| (space) | 17073 | 11.5% |
| e | 15136 | 10.2% |
| a | 9478 | 6.4% |
| o | 8849 | 5.9% |
| n | 8111 | 5.4% |
| r | 7992 | 5.4% |
| i | 7675 | 5.2% |
| t | 7307 | 4.9% |
| s | 5850 | 3.9% |
| h | 5578 | 3.7% |
| Other values (482) | 55826 |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 148875 |
final_budget
Real number (ℝ)
High correlation 
| Distinct | 755 |
|---|---|
| Distinct (%) | 7.7% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 33275943 |
| Minimum | 1 |
|---|---|
| Maximum | 5.332 × 10^8 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 76.7 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 500000 |
| Q1 | 5000000 |
| median | 17500000 |
| Q3 | 40000000 |
| 95-th percentile | 1.3 × 10^8 |
| Maximum | 5.332 × 10^8 |
| Range | 5.332 × 10^8 |
| Interquartile range (IQR) | 35000000 |
Descriptive statistics
| Standard deviation | 44332470 |
|---|---|
| Coefficient of variation (CV) | 1.3322679 |
| Kurtosis | 9.9851176 |
| Mean | 33275943 |
| Median Absolute Deviation (MAD) | 14225000 |
| Skewness | 2.6741561 |
| Sum | 3.2603769 × 10^11 |
| Variance | 1.9653679 × 10^15 |
| Monotonicity | Not monotonic |
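Several of the descriptive statistics are derived from others in the same tables. The following sketch recomputes three of them from the rounded figures shown above, using the standard definitions (coefficient of variation as standard deviation over mean, variance as squared standard deviation, IQR as Q3 minus Q1):

```python
# Figures copied from the final_budget tables (rounded as displayed).
mean = 33275943
std = 44332470
q1, q3 = 5_000_000, 40_000_000

cv = std / mean          # coefficient of variation
variance = std ** 2      # variance is the squared standard deviation
iqr = q3 - q1            # interquartile range

print(f"CV       = {cv:.4f}")
print(f"Variance = {variance:.4e}")
print(f"IQR      = {iqr}")
```

A coefficient of variation above 1 together with a skewness of about 2.67 points to a strongly right-skewed budget distribution.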
| Value | Count | Frequency (%) |
| 20000000 | 357 | 3.6% |
| 10000000 | 343 | 3.5% |
| 30000000 | 308 | 3.1% |
| 25000000 | 293 | 3.0% |
| 15000000 | 292 | 3.0% |
| 40000000 | 281 | 2.9% |
| 5000000 | 261 | 2.7% |
| 50000000 | 213 | 2.2% |
| 35000000 | 212 | 2.2% |
| 12000000 | 200 | 2.0% |
| Other values (745) | 7038 |
| Value | Count | Frequency (%) |
| 1 | 4 | |
| 2 | 1 | < 0.1% |
| 3 | 1 | < 0.1% |
| 5 | 2 | |
| 6 | 2 | |
| 8 | 2 | |
| 10 | 1 | < 0.1% |
| 11 | 1 | < 0.1% |
| 12 | 1 | < 0.1% |
| 15 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 533200000 | 1 | < 0.1% |
| 400000000 | 3 | |
| 380000000 | 1 | < 0.1% |
| 379000000 | 1 | < 0.1% |
| 365000000 | 1 | < 0.1% |
| 340000000 | 1 | < 0.1% |
| 330400000 | 1 | < 0.1% |
| 300000000 | 5 | |
| 290000000 | 1 | < 0.1% |
| 280200000 | 1 | < 0.1% |
final_worldwide_boxoffice
Real number (ℝ)
High correlation  Zeros 
| Distinct | 8943 |
|---|---|
| Distinct (%) | 91.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 96943398 |
| Minimum | 0 |
|---|---|
| Maximum | 2.923706 × 10^9 |
| Zeros | 428 |
| Zeros (%) | 4.4% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 76.7 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 1432.9 |
| Q1 | 5401410.8 |
| median | 30065016 |
| Q3 | 1.0232693 × 10^8 |
| 95-th percentile | 4.1592791 × 10^8 |
| Maximum | 2.923706 × 10^9 |
| Range | 2.923706 × 10^9 |
| Interquartile range (IQR) | 96925519 |
Descriptive statistics
| Standard deviation | 1.8599809 × 10^8 |
|---|---|
| Coefficient of variation (CV) | 1.9186257 |
| Kurtosis | 35.144501 |
| Mean | 96943398 |
| Median Absolute Deviation (MAD) | 29279883 |
| Skewness | 4.6845087 |
| Sum | 9.4985141 × 10^11 |
| Variance | 3.459529 × 10^16 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 0 | 428 | 4.4% |
| 2000000 | 16 | 0.2% |
| 11000000 | 15 | 0.2% |
| 8000000 | 12 | 0.1% |
| 10000000 | 11 | 0.1% |
| 7000000 | 10 | 0.1% |
| 9000000 | 9 | 0.1% |
| 6000000 | 9 | 0.1% |
| 4000000 | 8 | 0.1% |
| 2500000 | 7 | 0.1% |
| Other values (8933) | 9273 |
| Value | Count | Frequency (%) |
| 0 | 428 | |
| 1 | 3 | < 0.1% |
| 4 | 3 | < 0.1% |
| 5 | 1 | < 0.1% |
| 6 | 2 | < 0.1% |
| 11 | 2 | < 0.1% |
| 13 | 1 | < 0.1% |
| 14 | 1 | < 0.1% |
| 16 | 1 | < 0.1% |
| 17 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 2923706026 | 1 | |
| 2748242781 | 1 | |
| 2743577587 | 1 | |
| 2320250281 | 1 | |
| 2223048786 | 1 | |
| 2068223624 | 1 | |
| 2056046835 | 1 | |
| 2048359754 | 1 | |
| 1979091486 | 1 | |
| 1921206586 | 1 |
final_domestic_boxoffice
Real number (ℝ)
High correlation  Zeros 
| Distinct | 7744 |
|---|---|
| Distinct (%) | 79.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 43315548 |
| Minimum | 0 |
|---|---|
| Maximum | 9.3666222 × 10^8 |
| Zeros | 752 |
| Zeros (%) | 7.7% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 76.7 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 2005226.8 |
| median | 17529688 |
| Q3 | 53267000 |
| 95-th percentile | 1.7430967 × 10^8 |
| Maximum | 9.3666222 × 10^8 |
| Range | 9.3666222 × 10^8 |
| Interquartile range (IQR) | 51261773 |
Descriptive statistics
| Standard deviation | 71688773 |
|---|---|
| Coefficient of variation (CV) | 1.6550356 |
| Kurtosis | 24.247244 |
| Mean | 43315548 |
| Median Absolute Deviation (MAD) | 17370700 |
| Skewness | 3.9285551 |
| Sum | 4.2440574 × 10^11 |
| Variance | 5.1392802 × 10^15 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 0 | 752 | 7.7% |
| 8000000 | 12 | 0.1% |
| 7000000 | 11 | 0.1% |
| 10000000 | 11 | 0.1% |
| 2000000 | 9 | 0.1% |
| 4360000 | 8 | 0.1% |
| 4000000 | 7 | 0.1% |
| 11000000 | 6 | 0.1% |
| 25000000 | 6 | 0.1% |
| 36000000 | 5 | 0.1% |
| Other values (7734) | 8971 |
| Value | Count | Frequency (%) |
| 0 | 752 | |
| 30 | 1 | < 0.1% |
| 264 | 1 | < 0.1% |
| 310 | 1 | < 0.1% |
| 388 | 2 | < 0.1% |
| 401 | 1 | < 0.1% |
| 423 | 1 | < 0.1% |
| 527 | 1 | < 0.1% |
| 528 | 1 | < 0.1% |
| 673 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 936662225 | 2 | |
| 858373000 | 1 | |
| 814811535 | 1 | |
| 785221649 | 1 | |
| 749766139 | 1 | |
| 718732821 | 1 | |
| 700059566 | 1 | |
| 684075767 | 1 | |
| 678815482 | 1 | |
| 674460013 | 1 |
| Distinct | 7810 |
|---|---|
| Distinct (%) | 79.7% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 76.7 KiB |
Length
| Max length | 85 |
|---|---|
| Median length | 45 |
| Mean length | 13.451929 |
| Min length | 1 |
Unique
| Unique | 5934 |
|---|---|
| Unique (%) | 60.6% |
Sample
| 1st row | #horror |
|---|---|
| 2nd row | (500)daysofsummer |
| 3rd row | 10,000b.c. |
| 4th row | 10,000bc |
| 5th row | 101dalmatians |
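The sample rows of this variable look like the final_title samples lowercased with all whitespace removed. The sketch below reproduces that mapping; `normalize_title` is a hypothetical helper inferred from the five samples only, and the real preprocessing may differ.

```python
def normalize_title(title: str) -> str:
    """Lowercase a title and drop all whitespace (hypothetical helper)."""
    return "".join(title.lower().split())

# The five final_title sample rows from this report.
samples = ["#Horror", "(500) Days of Summer", "10,000 B.C.",
           "10,000 BC", "101 Dalmatians"]

for title in samples:
    print(f"{title!r} -> {normalize_title(title)!r}")
```

Punctuation is kept, which is why "10,000 B.C." and "10,000 BC" still map to two different keys.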
| Value | Count | Frequency (%) |
| kingkong | 5 | 0.1% |
| nightofthelivingdead | 4 | < 0.1% |
| shaft | 4 | < 0.1% |
| conanthebarbarian | 4 | < 0.1% |
| thealamo | 4 | < 0.1% |
| houseofwax | 4 | < 0.1% |
| cinderella | 4 | < 0.1% |
| thesignal | 4 | < 0.1% |
| robinhood | 4 | < 0.1% |
| ghostbusters | 4 | < 0.1% |
| Other values (7779) | 9757 |
Most occurring characters
| Value | Count | Frequency (%) |
| e | 15739 | |
| a | 10733 | 8.1% |
| t | 10513 | 8.0% |
| o | 9262 | 7.0% |
| r | 8891 | 6.7% |
| n | 8682 | 6.6% |
| i | 8499 | 6.4% |
| s | 7771 | 5.9% |
| h | 6611 | 5.0% |
| l | 6051 | 4.6% |
| Other values (449) | 39050 |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 131802 |
imdb_id
Text
Missing 
| Distinct | 7130 |
|---|---|
| Distinct (%) | 79.6% |
| Missing | 845 |
| Missing (%) | 8.6% |
| Memory size | 76.7 KiB |
Length
| Max length | 10 |
|---|---|
| Median length | 9 |
| Mean length | 9.0247962 |
| Min length | 9 |
Unique
| Unique | 5310 |
|---|---|
| Unique (%) | 59.3% |
Sample
| 1st row | tt3526286 |
|---|---|
| 2nd row | tt1022603 |
| 3rd row | tt0443649 |
| 4th row | tt0443649 |
| 5th row | tt0115433 |
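The length statistics (minimum 9, maximum 10) and the sample rows suggest identifiers of the form `tt` followed by seven or eight digits. A minimal validation sketch under that assumption; the pattern is inferred from this report rather than from an official specification, and the eight-digit example is made up.

```python
import re

# Assumed pattern: "tt" followed by 7 or 8 digits (9 to 10 characters total).
IMDB_ID = re.compile(r"tt\d{7,8}")

def is_valid_imdb_id(value: str) -> bool:
    """Return True if the whole string matches the assumed pattern."""
    return IMDB_ID.fullmatch(value) is not None

print(is_valid_imdb_id("tt3526286"))    # sample row from the report
print(is_valid_imdb_id("tt12345678"))   # hypothetical eight-digit ID
print(is_valid_imdb_id("3526286"))      # missing the "tt" prefix
```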
| Value | Count | Frequency (%) |
| tt1325004 | 3 | < 0.1% |
| tt1073498 | 3 | < 0.1% |
| tt3470600 | 3 | < 0.1% |
| tt1318514 | 2 | < 0.1% |
| tt1034331 | 2 | < 0.1% |
| tt1821549 | 2 | < 0.1% |
| tt1559547 | 2 | < 0.1% |
| tt0141926 | 2 | < 0.1% |
| tt0498381 | 2 | < 0.1% |
| tt0388500 | 2 | < 0.1% |
| Other values (7120) | 8930 |
Most occurring characters
| Value | Count | Frequency (%) |
| t | 17906 | |
| 0 | 11679 | |
| 1 | 8299 | |
| 2 | 6443 | 8.0% |
| 4 | 5820 | 7.2% |
| 3 | 5771 | 7.1% |
| 8 | 5323 | 6.6% |
| 6 | 5105 | 6.3% |
| 9 | 4955 | 6.1% |
| 7 | 4904 | 6.1% |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 80799 |
production_companies
Text
Missing 
| Distinct | 5976 |
|---|---|
| Distinct (%) | 71.2% |
| Missing | 1408 |
| Missing (%) | 14.4% |
| Memory size | 76.7 KiB |
Length
| Max length | 473 |
|---|---|
| Median length | 220 |
| Mean length | 67.089035 |
| Min length | 1 |
Unique
| Unique | 4319 |
|---|---|
| Unique (%) | 51.5% |
Sample
| 1st row | Centropolis Entertainment, Legendary Pictures, The Department of Trade, Industry and Competition of South Africa, Moonlighting Films, Warner Bros. Pictures |
|---|---|
| 2nd row | Centropolis Entertainment, Legendary Pictures, The Department of Trade, Industry and Competition of South Africa, Moonlighting Films, Warner Bros. Pictures |
| 3rd row | Walt Disney Pictures, Cruella Productions, Kanzaman S.A.M. |
| 4th row | Bad Robot |
| 5th row | Bad Robot |
| Value | Count | Frequency (%) |
| pictures | 5884 | 8.5% |
| productions | 4071 | 5.9% |
| entertainment | 3461 | 5.0% |
| films | 3449 | 5.0% |
| film | 1353 | 2.0% |
| media | 859 | 1.2% |
| fox | 736 | 1.1% |
| the | 699 | 1.0% |
| warner | 691 | 1.0% |
| company | 680 | 1.0% |
| Other values (7294) | 47469 |
Most occurring characters
| Value | Count | Frequency (%) |
| (space) | 60965 | 10.8% |
| e | 41595 | 7.4% |
| i | 41003 | 7.3% |
| n | 38111 | 6.8% |
| t | 37753 | 6.7% |
| r | 34837 | 6.2% |
| o | 29851 | 5.3% |
| a | 29580 | 5.3% |
| s | 26026 | 4.6% |
| , | 21063 | 3.7% |
| Other values (118) | 202093 |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 562877 |
release_date
Date
Missing 
| Distinct | 5106 |
|---|---|
| Distinct (%) | 52.9% |
| Missing | 149 |
| Missing (%) | 1.5% |
| Memory size | 76.7 KiB |
| Minimum | 1915-02-08 00:00:00 |
|---|---|
| Maximum | 2068-12-11 00:00:00 |
| Invalid dates | 0 |
| Invalid dates (%) | 0.0% |
final_year
Real number (ℝ)
Missing 
| Distinct | 144 |
|---|---|
| Distinct (%) | 1.5% |
| Missing | 149 |
| Missing (%) | 1.5% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 2005.1106 |
| Minimum | 1915 |
|---|---|
| Maximum | 2068 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 76.7 KiB |
Quantile statistics
| Minimum | 1915 |
|---|---|
| 5-th percentile | 1980 |
| Q1 | 1999 |
| median | 2007 |
| Q3 | 2014 |
| 95-th percentile | 2021 |
| Maximum | 2068 |
| Range | 153 |
| Interquartile range (IQR) | 15 |
Descriptive statistics
| Standard deviation | 14.583461 |
|---|---|
| Coefficient of variation (CV) | 0.0072731456 |
| Kurtosis | 5.7825191 |
| Mean | 2005.1106 |
| Median Absolute Deviation (MAD) | 7 |
| Skewness | -0.81713557 |
| Sum | 19347312 |
| Variance | 212.67734 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 2015 | 451 | 4.6% |
| 2011 | 405 | 4.1% |
| 2016 | 401 | 4.1% |
| 2010 | 400 | 4.1% |
| 2014 | 395 | 4.0% |
| 2006 | 386 | 3.9% |
| 2008 | 383 | 3.9% |
| 2013 | 379 | 3.9% |
| 2012 | 364 | 3.7% |
| 2009 | 359 | 3.7% |
| Other values (134) | 5726 |
| Value | Count | Frequency (%) |
| 1915 | 2 | < 0.1% |
| 1916 | 1 | < 0.1% |
| 1921 | 1 | < 0.1% |
| 1922 | 1 | < 0.1% |
| 1924 | 1 | < 0.1% |
| 1925 | 5 | |
| 1927 | 2 | < 0.1% |
| 1928 | 3 | |
| 1929 | 1 | < 0.1% |
| 1930 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 2068 | 8 | |
| 2067 | 9 | |
| 2066 | 6 | |
| 2065 | 7 | |
| 2064 | 8 | |
| 2063 | 8 | |
| 2062 | 8 | |
| 2061 | 5 | |
| 2060 | 5 | |
| 2059 | 4 |
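The years 2059 to 2068 lie well after the analysis date (2025-03-31). One possible cause, offered only as a hypothesis, is that some source dates carried two-digit years: Python's `%y` directive maps 00 to 68 onto 2000 to 2068 and 69 to 99 onto 1969 to 1999, so a 1968 release written as `68` would be parsed as 2068. The date strings and the `fix_future_year` helper below are hypothetical.

```python
from datetime import date, datetime

# Hypothetical two-digit-year strings; the report does not show the raw format.
raw = ["12/11/68", "06/15/69"]
parsed = [datetime.strptime(s, "%m/%d/%y").date() for s in raw]
print(parsed)  # '68' lands in 2068, '69' in 1969

def fix_future_year(d: date, cutoff: date) -> date:
    """Shift a date back 100 years if it falls after the cutoff (hypothetical repair)."""
    return d.replace(year=d.year - 100) if d > cutoff else d

cutoff = date(2025, 3, 31)  # the report's analysis date
print([fix_future_year(d, cutoff) for d in parsed])
```

Whether such a shift is appropriate depends on the source data; announced future releases would need to be excluded first.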
director
Text
Missing 
| Distinct | 3362 |
|---|---|
| Distinct (%) | 37.5% |
| Missing | 827 |
| Missing (%) | 8.4% |
| Memory size | 76.7 KiB |
Length
| Max length | 330 |
|---|---|
| Median length | 152 |
| Mean length | 14.209899 |
| Min length | 3 |
Unique
| Unique | 1889 |
|---|---|
| Unique (%) | 21.1% |
Sample
| 1st row | Tara Subkoff |
|---|---|
| 2nd row | Marc Webb |
| 3rd row | Roland Emmerich |
| 4th row | Roland Emmerich |
| 5th row | Stephen Herek |
| Value | Count | Frequency (%) |
| john | 347 | 1.7% |
| david | 297 | 1.5% |
| michael | 242 | 1.2% |
| peter | 173 | 0.9% |
| robert | 166 | 0.8% |
| james | 165 | 0.8% |
| paul | 142 | 0.7% |
| scott | 120 | 0.6% |
| richard | 119 | 0.6% |
| lee | 118 | 0.6% |
| Other values (4219) | 18187 |
Most occurring characters
| Value | Count | Frequency (%) |
| e | 11805 | 9.3% |
| (space) | 11106 | 8.7% |
| a | 10244 | 8.0% |
| n | 8975 | 7.0% |
| r | 8571 | 6.7% |
| o | 7389 | 5.8% |
| i | 7294 | 5.7% |
| l | 5889 | 4.6% |
| t | 4502 | 3.5% |
| s | 4155 | 3.3% |
| Other values (96) | 47547 |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 127477 |
star
Text
Missing 
| Distinct | 7240 |
|---|---|
| Distinct (%) | 80.9% |
| Missing | 849 |
| Missing (%) | 8.7% |
| Memory size | 76.7 KiB |
Length
| Max length | 89 |
|---|---|
| Median length | 79 |
| Mean length | 57.603196 |
| Min length | 9 |
Unique
| Unique | 5592 |
|---|---|
| Unique (%) | 62.5% |
Sample
| 1st row | Sadie Seelert, Haley Murphy, Bridget McGarry |
|---|---|
| 2nd row | Zooey Deschanel, Joseph Gordon-Levitt, Geoffrey Arend |
| 3rd row | Steven Strait, Camilla Belle, Cliff Curtis, Nathanael Baring |
| 4th row | Steven Strait, Camilla Belle, Cliff Curtis, Nathanael Baring |
| 5th row | Glenn Close, Jeff Daniels, Joely Richardson |
| Value | Count | Frequency (%) |
| john | 648 | 0.9% |
| michael | 590 | 0.8% |
| james | 460 | 0.6% |
| robert | 399 | 0.5% |
| david | 387 | 0.5% |
| tom | 343 | 0.5% |
| jason | 288 | 0.4% |
| chris | 270 | 0.4% |
| kevin | 265 | 0.4% |
| jennifer | 248 | 0.3% |
| Other values (10885) | 68883 |
Most occurring characters
| Value | Count | Frequency (%) |
| (space) | 63835 | 12.4% |
| e | 43742 | 8.5% |
| a | 41963 | 8.1% |
| n | 33949 | 6.6% |
| r | 28763 | 5.6% |
| i | 28537 | 5.5% |
| , | 26322 | 5.1% |
| o | 25626 | 5.0% |
| l | 23476 | 4.6% |
| t | 16459 | 3.2% |
| Other values (120) | 182819 |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 515491 |
_merge
Categorical
| Distinct | 3 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 76.7 KiB |
Length
| Max length | 10 |
|---|---|
| Median length | 9 |
| Mean length | 8.435395 |
| Min length | 4 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | both |
|---|---|
| 2nd row | both |
| 3rd row | right_only |
| 4th row | left_only |
| 5th row | left_only |
Common Values
| Value | Count | Frequency (%) |
| right_only | 4578 | |
| left_only | 3198 | |
| both | 2022 |
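The three categories match the labels pandas assigns when a merge is run with `indicator=True`, which suggests (but does not prove) that the dataset was built from an outer join of two sources. The sketch below reproduces the labeling logic on plain Python sets; `merge_indicator` and the keys are made up for illustration.

```python
from collections import Counter

def merge_indicator(left_keys: set, right_keys: set) -> dict:
    """Label each key of an outer join as 'both', 'left_only' or 'right_only'."""
    labels = {}
    for key in left_keys | right_keys:
        if key in left_keys and key in right_keys:
            labels[key] = "both"
        elif key in left_keys:
            labels[key] = "left_only"
        else:
            labels[key] = "right_only"
    return labels

# Hypothetical keys, not taken from the dataset.
left = {"kingkong", "shaft", "cinderella"}
right = {"cinderella", "robinhood"}
labels = merge_indicator(left, right)
print(Counter(labels.values()))

# The three counts in the table above cover every observation.
print(4578 + 3198 + 2022)  # 9798
```

With only 2022 of 9798 rows labeled `both`, most records come from a single source, which may explain part of the missing-value pattern in the enrichment columns.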
Most occurring characters
| Value | Count | Frequency (%) |
| l | 10974 | |
| t | 9798 | |
| o | 9798 | |
| _ | 7776 | |
| n | 7776 | |
| y | 7776 | |
| h | 6600 | |
| r | 4578 | |
| i | 4578 | |
| g | 4578 | |
| Other values (3) | 8418 |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 82650 |
certificate
Categorical
Imbalance  Missing 
| Distinct | 13 |
|---|---|
| Distinct (%) | 0.2% |
| Missing | 1587 |
| Missing (%) | 16.2% |
| Memory size | 76.7 KiB |
Length
| Max length | 9 |
|---|---|
| Median length | 8 |
| Mean length | 2.549385 |
| Min length | 1 |
Unique
| Unique | 3 |
|---|---|
| Unique (%) | < 0.1% |
Sample
| 1st row | Not Rated |
|---|---|
| 2nd row | PG-13 |
| 3rd row | PG-13 |
| 4th row | PG-13 |
| 5th row | G |
Common Values
| Value | Count | Frequency (%) |
| R | 3607 | |
| PG-13 | 2696 | |
| PG | 1351 | 13.8% |
| G | 249 | 2.5% |
| NR | 249 | 2.5% |
| NC-17 | 27 | 0.3% |
| Not Rated | 20 | 0.2% |
| Approved | 5 | 0.1% |
| Unrated | 2 | < 0.1% |
| PG-13 | 2 | < 0.1% |
| Other values (3) | 3 | < 0.1% |
| (Missing) | 1587 |
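The 50.0% imbalance figure reported for certificate appears consistent with one minus the Shannon entropy of the class distribution, normalized by log2 of the number of classes. That definition is an assumption here, checked only against the counts in this table; the three "Other values" are assumed to occur once each, in line with the Unique count of 3.

```python
import math

# Counts from the Common Values table; the last three are the assumed
# singletons behind "Other values (3)".
counts = [3607, 2696, 1351, 249, 249, 27, 20, 5, 2, 2, 1, 1, 1]

total = sum(counts)                        # non-missing rows
probs = [c / total for c in counts]
entropy = -sum(p * math.log2(p) for p in probs)

# 0 means perfectly balanced classes, 1 means a single dominant class.
imbalance = 1 - entropy / math.log2(len(counts))

print(f"non-missing rows = {total}")
print(f"imbalance        = {imbalance:.1%}")
```

The total of 8211 equals 9798 observations minus the 1587 missing values, so the counts are complete.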
| Value | Count | Frequency (%) |
| r | 3607 | |
| pg-13 | 2698 | |
| pg | 1351 | 16.4% |
| g | 249 | 3.0% |
| nr | 249 | 3.0% |
| nc-17 | 27 | 0.3% |
| not | 20 | 0.2% |
| rated | 20 | 0.2% |
| approved | 5 | 0.1% |
| unrated | 2 | < 0.1% |
| Other values (3) | 3 | < 0.1% |
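The word table shows `pg-13` 2698 times, while the Common Values table lists PG-13 twice (2696 and 2), so the two spellings probably differ only in invisible characters such as surrounding whitespace. "NR", "Not Rated" and "Unrated" also look like variants of one label. A cleanup sketch under those assumptions; the `ALIASES` mapping is hypothetical and whether these labels are truly equivalent is a judgment call.

```python
# Hypothetical mapping of "unrated" spellings to a single label.
ALIASES = {"not rated": "NR", "unrated": "NR", "nr": "NR"}

def normalize_certificate(value):
    """Trim whitespace and fold 'unrated' spellings into 'NR'; None stays None."""
    if value is None:
        return None
    cleaned = value.strip()
    return ALIASES.get(cleaned.lower(), cleaned)

print(normalize_certificate("Not Rated"))   # NR
print(normalize_certificate(" PG-13"))      # PG-13
print(normalize_certificate(None))          # None
```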
Most occurring characters
| Value | Count | Frequency (%) |
| G | 4298 | |
| P | 4050 | |
| R | 3876 | |
| - | 2727 | |
| 1 | 2726 | |
| 3 | 2698 | |
| N | 296 | 1.4% |
| t | 42 | 0.2% |
| d | 28 | 0.1% |
| e | 28 | 0.1% |
| Other values (16) | 164 | 0.8% |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 20933 |
rating
Real number (ℝ)
Missing 
| Distinct | 1723 |
|---|---|
| Distinct (%) | 19.5% |
| Missing | 983 |
| Missing (%) | 10.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 6.4313958 |
| Minimum | 0.5 |
|---|---|
| Maximum | 10 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 76.7 KiB |
Quantile statistics
| Minimum | 0.5 |
|---|---|
| 5-th percentile | 4.9 |
| Q1 | 5.9 |
| median | 6.5 |
| Q3 | 7.0015 |
| 95-th percentile | 7.7986 |
| Maximum | 10 |
| Range | 9.5 |
| Interquartile range (IQR) | 1.1015 |
Descriptive statistics
| Standard deviation | 0.89636096 |
|---|---|
| Coefficient of variation (CV) | 0.13937269 |
| Kurtosis | 2.1521949 |
| Mean | 6.4313958 |
| Median Absolute Deviation (MAD) | 0.546 |
| Skewness | -0.63388987 |
| Sum | 56692.754 |
| Variance | 0.80346297 |
| Monotonicity | Not monotonic |
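For variables with missing values, the reported mean is taken over the non-missing rows only. The sketch below checks this for rating and for final_year by dividing each reported sum by the number of observations minus the missing count; all figures are copied from this report.

```python
n_observations = 9798

# (sum, missing count, reported mean) copied from the respective tables.
variables = {
    "rating": (56692.754, 983, 6.4313958),
    "final_year": (19347312, 149, 2005.1106),
}

means = {}
for name, (total, missing, reported_mean) in variables.items():
    n_valid = n_observations - missing      # rows that actually have a value
    means[name] = total / n_valid
    print(f"{name}: n_valid={n_valid}, mean={means[name]:.4f}, reported={reported_mean}")
```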
| Value | Count | Frequency (%) |
| 6.5 | 290 | 3.0% |
| 6.2 | 266 | 2.7% |
| 6.6 | 262 | 2.7% |
| 6 | 262 | 2.7% |
| 6.3 | 261 | 2.7% |
| 6.4 | 255 | 2.6% |
| 6.7 | 245 | 2.5% |
| 6.1 | 244 | 2.5% |
| 6.9 | 215 | 2.2% |
| 5.8 | 211 | 2.2% |
| Other values (1713) | 6304 | |
| (Missing) | 983 | 10.0% |
| Value | Count | Frequency (%) |
| 0.5 | 2 | |
| 1 | 2 | |
| 1.2 | 1 | < 0.1% |
| 1.5 | 1 | < 0.1% |
| 1.9 | 1 | < 0.1% |
| 2 | 4 | |
| 2.056 | 1 | < 0.1% |
| 2.4 | 2 | |
| 2.5 | 4 | |
| 2.6 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 10 | 10 | |
| 9.8 | 1 | < 0.1% |
| 9 | 1 | < 0.1% |
| 8.708 | 1 | < 0.1% |
| 8.7 | 2 | < 0.1% |
| 8.6 | 2 | < 0.1% |
| 8.566 | 2 | < 0.1% |
| 8.549 | 1 | < 0.1% |
| 8.538 | 2 | < 0.1% |
| 8.519 | 2 | < 0.1% |
runtime
Real number (ℝ)
Missing 
| Distinct | 199 |
|---|---|
| Distinct (%) | 2.2% |
| Missing | 903 |
| Missing (%) | 9.2% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 108.1765 |
| Minimum | 2 |
|---|---|
| Maximum | 339 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 76.7 KiB |
Quantile statistics
| Minimum | 2 |
|---|---|
| 5-th percentile | 83 |
| Q1 | 95 |
| median | 105 |
| Q3 | 119 |
| 95-th percentile | 145 |
| Maximum | 339 |
| Range | 337 |
| Interquartile range (IQR) | 24 |
Descriptive statistics
| Standard deviation | 22.155153 |
|---|---|
| Coefficient of variation (CV) | 0.20480559 |
| Kurtosis | 5.9755883 |
| Mean | 108.1765 |
| Median Absolute Deviation (MAD) | 12 |
| Skewness | 0.73393337 |
| Sum | 962230 |
| Variance | 490.85079 |
| Monotonicity | Not monotonic |
| Value | Count | Frequency (%) |
| 98 | 246 | 2.5% |
| 90 | 245 | 2.5% |
| 100 | 241 | 2.5% |
| 95 | 234 | 2.4% |
| 97 | 229 | 2.3% |
| 104 | 215 | 2.2% |
| 107 | 213 | 2.2% |
| 101 | 213 | 2.2% |
| 105 | 211 | 2.2% |
| 99 | 203 | 2.1% |
| Other values (189) | 6645 | |
| (Missing) | 903 | 9.2% |
| Value | Count | Frequency (%) |
| 2 | 2 | < 0.1% |
| 3 | 3 | |
| 4 | 1 | < 0.1% |
| 5 | 5 | |
| 6 | 2 | < 0.1% |
| 7 | 4 | |
| 8 | 4 | |
| 9 | 1 | < 0.1% |
| 10 | 3 | |
| 11 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 339 | 1 | |
| 266 | 1 | |
| 254 | 1 | |
| 251 | 1 | |
| 242 | 1 | |
| 240 | 1 | |
| 233 | 1 | |
| 229 | 1 | |
| 228 | 1 | |
| 225 | 1 |
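The extreme values include runtimes of a few minutes and one of 339 minutes. One common way to flag such rows is Tukey's 1.5 × IQR rule, sketched below with the quartiles from the quantile statistics. This rule is not part of the report and the thresholds are a convention, not a finding.

```python
# Quartiles copied from the runtime quantile statistics.
q1, q3 = 95, 119
iqr = q3 - q1                      # 24, as reported

# Tukey's rule: flag values more than 1.5 * IQR outside the quartiles.
lower_fence = q1 - 1.5 * iqr       # 59.0
upper_fence = q3 + 1.5 * iqr       # 155.0

def is_outlier(runtime_minutes: float) -> bool:
    """Return True if the runtime falls outside the Tukey fences."""
    return runtime_minutes < lower_fence or runtime_minutes > upper_fence

for value in (2, 105, 339):        # minimum, median, maximum from the report
    print(value, is_outlier(value))
```

Very short runtimes may be legitimate short films or data-entry errors; the rule only marks them for review.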
genres
Text
Missing 
| Distinct | 1703 |
|---|---|
| Distinct (%) | 19.2% |
| Missing | 907 |
| Missing (%) | 9.3% |
| Memory size | 76.7 KiB |
Length
| Max length | 73 |
|---|---|
| Median length | 61 |
| Mean length | 21.511978 |
| Min length | 5 |
Unique
| Unique | 812 |
|---|---|
| Unique (%) | 9.1% |
Sample
| 1st row | Crime, Drama, Horror |
|---|---|
| 2nd row | Comedy, Drama, Romance |
| 3rd row | Adventure, Action, Drama, Fantasy |
| 4th row | Adventure, Action, Drama, Fantasy |
| 5th row | Adventure, Comedy, Crime |
| Value | Count | Frequency (%) |
| drama | 4129 | |
| comedy | 3148 | |
| action | 2338 | |
| thriller | 2331 | |
| adventure | 1778 | 7.2% |
| romance | 1665 | 6.7% |
| crime | 1449 | 5.8% |
| horror | 1056 | 4.3% |
| science | 1044 | 4.2% |
| fiction | 1044 | 4.2% |
| Other values (16) | 4820 |
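The word table splits on whitespace, which is why "science" and "fiction" appear as separate entries with identical counts (1044). Counting genres rather than words means splitting each value on commas instead. A sketch using the sample rows from this report plus one hypothetical row containing "Science Fiction":

```python
from collections import Counter

# Four distinct sample rows from the report; the last row is hypothetical
# and only added to show the multi-word case.
rows = [
    "Crime, Drama, Horror",
    "Comedy, Drama, Romance",
    "Adventure, Action, Drama, Fantasy",
    "Adventure, Comedy, Crime",
    "Action, Science Fiction",
]

# Split on commas and strip padding so multi-word genres stay intact.
genre_counts = Counter(
    genre.strip() for row in rows for genre in row.split(",") if genre.strip()
)
print(genre_counts["Drama"])             # 3
print(genre_counts["Science Fiction"])   # 1
```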
Most occurring characters
| Value | Count | Frequency (%) |
| r | 17127 | 9.0% |
| (space) | 15911 | 8.3% |
| e | 15473 | 8.1% |
| , | 14839 | 7.8% |
| a | 13902 | 7.3% |
| i | 12259 | 6.4% |
| m | 12101 | 6.3% |
| o | 11598 | 6.1% |
| n | 10230 | 5.3% |
| t | 8238 | 4.3% |
| Other values (23) | 59585 |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 191263 |
production_countries
Text
Missing
| Distinct | 775 |
|---|---|
| Distinct (%) | 8.8% |
| Missing | 1001 |
| Missing (%) | 10.2% |
| Memory size | 76.7 KiB |
Length
| Max length | 111 |
|---|---|
| Median length | 24 |
| Mean length | 26.902694 |
| Min length | 4 |
Unique
| Unique | 458 |
|---|---|
| Unique (%) | 5.2% |
Sample
| 1st row | United States |
|---|---|
| 2nd row | United States |
| 3rd row | United States of America, South Africa, New Zealand |
| 4th row | United States of America, South Africa, New Zealand |
| 5th row | United States, United Kingdom |
| Value | Count | Frequency (%) |
| united | 9086 | 24.4% |
| states | 7670 | 20.6% |
| of | 7314 | 19.6% |
| america | 7314 | 19.6% |
| kingdom | 1391 | 3.7% |
| france | 688 | 1.8% |
| germany | 565 | 1.5% |
| canada | 507 | 1.4% |
| japan | 190 | 0.5% |
| australia | 189 | 0.5% |
| Other values (108) | 2332 | 6.3% |
Most occurring characters
| Value | Count | Frequency (%) |
| (space) | 28449 | 12.0% |
| e | 26361 | 11.1% |
| t | 25046 | 10.6% |
| a | 20214 | 8.5% |
| i | 19188 | 8.1% |
| n | 13814 | 5.8% |
| d | 11548 | 4.9% |
| m | 9527 | 4.0% |
| r | 9380 | 4.0% |
| o | 9372 | 4.0% |
| Other values (40) | 63764 | 26.9% |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 236663 | 100.0% |
original_language
Text
Missing
| Distinct | 165 |
|---|---|
| Distinct (%) | 1.9% |
| Missing | 965 |
| Missing (%) | 9.8% |
| Memory size | 76.7 KiB |
Length
| Max length | 87 |
|---|---|
| Median length | 7 |
| Mean length | 7.2580097 |
| Min length | 3 |
Unique
| Unique | 92 |
|---|---|
| Unique (%) | 1.0% |
Sample
| 1st row | English |
|---|---|
| 2nd row | English, French, Swedish |
| 3rd row | English |
| 4th row | English |
| 5th row | English, Spanish |
| Value | Count | Frequency (%) |
| english | 7412 | 81.0% |
| french | 282 | 3.1% |
| spanish | 237 | 2.6% |
| german | 177 | 1.9% |
| mandarin | 109 | 1.2% |
| japanese | 105 | 1.1% |
| arabic | 84 | 0.9% |
| hindi | 83 | 0.9% |
| russian | 72 | 0.8% |
| italian | 65 | 0.7% |
| Other values (78) | 521 | 5.7% |
Most occurring characters
| Value | Count | Frequency (%) |
| n | 9067 | 14.1% |
| i | 8441 | 13.2% |
| s | 8120 | 12.7% |
| h | 8069 | 12.6% |
| l | 7553 | 11.8% |
| g | 7543 | 11.8% |
| E | 7416 | 11.6% |
| a | 1537 | 2.4% |
| e | 1091 | 1.7% |
| r | 831 | 1.3% |
| Other values (42) | 4442 | 6.9% |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 64110 | 100.0% |
enrichment_source
Categorical
Imbalance  Missing 
| Distinct | 3 |
|---|---|
| Distinct (%) | < 0.1% |
| Missing | 866 |
| Missing (%) | 8.8% |
| Memory size | 76.7 KiB |
| Value | Count |
|---|---|
| tmdb | 8540 |
| omdb_id | 383 |
| omdb_title | 9 |
Length
| Max length | 10 |
|---|---|
| Median length | 4 |
| Mean length | 4.1346843 |
| Min length | 4 |
Unique
| Unique | 0 |
|---|---|
| Unique (%) | 0.0% |
Sample
| 1st row | omdb_id |
|---|---|
| 2nd row | omdb_id |
| 3rd row | tmdb |
| 4th row | tmdb |
| 5th row | omdb_id |
Common Values
| Value | Count | Frequency (%) |
| tmdb | 8540 | 87.2% |
| omdb_id | 383 | 3.9% |
| omdb_title | 9 | 0.1% |
| (Missing) | 866 | 8.8% |
Common Values (Plot)
| Value | Count | Frequency (%) |
| tmdb | 8540 | 95.6% |
| omdb_id | 383 | 4.3% |
| omdb_title | 9 | 0.1% |
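The two Common Values tables report different percentages for the same counts (for example, `omdb_id` at 3.9% versus 4.3%). The figures are consistent with the first table dividing by all rows, missing included, and the plot table dividing by non-missing rows only. The sketch below checks that reading against the counts printed in this report; the `share` helper is a hypothetical name, and the choice of denominators is inferred from the numbers rather than taken from the profiler's documentation.

```python
# Counts copied from the enrichment_source tables in this report.
counts = {"tmdb": 8540, "omdb_id": 383, "omdb_title": 9}
missing = 866

non_missing = sum(counts.values())  # rows that have a value
all_rows = non_missing + missing    # every observation


def share(count, denominator):
    """Percentage of `denominator`, rounded to one decimal place."""
    return round(100 * count / denominator, 1)


for value, count in counts.items():
    print(
        f"{value}: {share(count, all_rows)}% of all rows, "
        f"{share(count, non_missing)}% of non-missing rows"
    )
```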
Most occurring characters
| Value | Count | Frequency (%) |
| d | 9315 | 25.2% |
| m | 8932 | 24.2% |
| b | 8932 | 24.2% |
| t | 8558 | 23.2% |
| o | 392 | 1.1% |
| _ | 392 | 1.1% |
| i | 392 | 1.1% |
| l | 9 | < 0.1% |
| e | 9 | < 0.1% |
Most occurring categories
| Value | Count | Frequency (%) |
| (unknown) | 36931 | 100.0% |
Interactions
Correlations
| | _merge | certificate | enrichment_source | final_budget | final_domestic_boxoffice | final_worldwide_boxoffice | final_year | rating | runtime |
|---|---|---|---|---|---|---|---|---|---|
| _merge | 1.000 | 0.075 | 0.130 | 0.046 | 0.018 | 0.028 | 0.261 | 0.069 | 0.072 |
| certificate | 0.075 | 1.000 | 0.348 | 0.108 | 0.075 | 0.075 | 0.248 | 0.040 | 0.094 |
| enrichment_source | 0.130 | 0.348 | 1.000 | 0.000 | 0.000 | 0.000 | 0.035 | 0.020 | 0.033 |
| final_budget | 0.046 | 0.108 | 0.000 | 1.000 | 0.668 | 0.724 | 0.106 | 0.057 | 0.321 |
| final_domestic_boxoffice | 0.018 | 0.075 | 0.000 | 0.668 | 1.000 | 0.926 | -0.093 | 0.256 | 0.252 |
| final_worldwide_boxoffice | 0.028 | 0.075 | 0.000 | 0.724 | 0.926 | 1.000 | 0.019 | 0.281 | 0.294 |
| final_year | 0.261 | 0.248 | 0.035 | 0.106 | -0.093 | 0.019 | 1.000 | -0.011 | 0.005 |
| rating | 0.069 | 0.040 | 0.020 | 0.057 | 0.256 | 0.281 | -0.011 | 1.000 | 0.394 |
| runtime | 0.072 | 0.094 | 0.033 | 0.321 | 0.252 | 0.294 | 0.005 | 0.394 | 1.000 |
Missing values
Sample
First rows
| | final_title | final_budget | final_worldwide_boxoffice | final_domestic_boxoffice | final_clean_title | imdb_id | production_companies | release_date | final_year | director | star | _merge | certificate | rating | runtime | genres | production_countries | original_language | enrichment_source |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | #Horror | 1500000.0 | 0.0 | 0.0 | #horror | tt3526286 | NaN | 2015-11-20 | 2015.0 | Tara Subkoff | Sadie Seelert, Haley Murphy, Bridget McGarry | both | Not Rated | 3.100 | 97.0 | Crime, Drama, Horror | United States | English | omdb_id |
| 1 | (500) Days of Summer | 7500000.0 | 34515303.0 | 32425665.0 | (500)daysofsummer | tt1022603 | NaN | 2009-07-17 | 2009.0 | Marc Webb | Zooey Deschanel, Joseph Gordon-Levitt, Geoffrey Arend | both | PG-13 | 7.700 | 95.0 | Comedy, Drama, Romance | United States | English, French, Swedish | omdb_id |
| 2 | 10,000 B.C. | 105000000.0 | 269065678.0 | 94784201.0 | 10,000b.c. | tt0443649 | Centropolis Entertainment, Legendary Pictures, The Department of Trade, Industry and Competition of South Africa, Moonlighting Films, Warner Bros. Pictures | 2008-03-07 | 2008.0 | Roland Emmerich | Steven Strait, Camilla Belle, Cliff Curtis, Nathanael Baring | right_only | PG-13 | 5.500 | 109.0 | Adventure, Action, Drama, Fantasy | United States of America, South Africa, New Zealand | English | tmdb |
| 3 | 10,000 BC | 105000000.0 | 269784201.0 | 94784201.0 | 10,000bc | tt0443649 | Centropolis Entertainment, Legendary Pictures, The Department of Trade, Industry and Competition of South Africa, Moonlighting Films, Warner Bros. Pictures | 2008-02-22 | 2008.0 | Roland Emmerich | Steven Strait, Camilla Belle, Cliff Curtis, Nathanael Baring | left_only | PG-13 | 5.500 | 109.0 | Adventure, Action, Drama, Fantasy | United States of America, South Africa, New Zealand | English | tmdb |
| 4 | 101 Dalmatians | 54000000.0 | 320689294.0 | 136189294.0 | 101dalmatians | tt0115433 | NaN | 1996-11-17 | 1996.0 | Stephen Herek | Glenn Close, Jeff Daniels, Joely Richardson | left_only | G | 5.700 | 103.0 | Adventure, Comedy, Crime | United States, United Kingdom | English, Spanish | omdb_id |
| 5 | 102 Dalmatians | 85000000.0 | 183611771.0 | 66957026.0 | 102dalmatians | tt0211181 | NaN | 2000-10-07 | 2000.0 | Kevin Lima | Glenn Close, Gérard Depardieu, Ioan Gruffudd | left_only | G | 4.900 | 100.0 | Adventure, Comedy, Family | United States, United Kingdom, Monaco | English | omdb_id |
| 6 | 102 Dalmatians | 85000000.0 | 66941559.0 | 66941559.0 | 102dalmatians | tt0211181 | Walt Disney Pictures, Cruella Productions, Kanzaman S.A.M. | 2000-11-22 | 2000.0 | Kevin Lima | Glenn Close, Gérard Depardieu, Ioan Gruffudd, Alice Evans | right_only | G | 5.500 | 100.0 | Family, Comedy | United States of America | English | tmdb |
| 7 | 10 Cloverfield Lane | 15000000.0 | 108286422.0 | 72082999.0 | 10cloverfieldlane | tt1179933 | Bad Robot | 2016-01-04 | 2016.0 | Dan Trachtenberg | John Goodman, Mary Elizabeth Winstead, John Gallagher Jr., Douglas M. Griffin | right_only | PG-13 | 6.993 | 104.0 | Thriller, Science Fiction, Drama, Horror | United States of America | English | tmdb |
| 8 | 10 Cloverfield Lane | 15000000.0 | 110216998.0 | 72082998.0 | 10cloverfieldlane | tt1179933 | Bad Robot | 2016-03-10 | 2016.0 | Dan Trachtenberg | John Goodman, Mary Elizabeth Winstead, John Gallagher Jr., Douglas M. Griffin | left_only | PG-13 | 6.993 | 104.0 | Thriller, Science Fiction, Drama, Horror | United States of America | English | tmdb |
| 9 | 10 Days in a Madhouse | 12000000.0 | 14616.0 | 14616.0 | 10daysinamadhouse | tt3453052 | NaN | 2015-11-11 | 2015.0 | Timothy Hines | Caroline Barry, Christopher Lambert, Kelly LeBrock, Julia Chantrey | right_only | NaN | 4.500 | 111.0 | Drama | United States of America | English | tmdb |
Last rows
| | final_title | final_budget | final_worldwide_boxoffice | final_domestic_boxoffice | final_clean_title | imdb_id | production_companies | release_date | final_year | director | star | _merge | certificate | rating | runtime | genres | production_countries | original_language | enrichment_source |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 9788 | 마더 | 5000000.0 | 17112713.0 | 547292.0 | 마더 | tt1216496 | Barunson E&A, CJ Entertainment | 2009-05-28 | 2009.0 | Bong Joon Ho | Kim Hye-ja, Won Bin, Jin Goo, Yoon Je-moon | left_only | R | 7.700 | 129.0 | Crime, Drama, Mystery | South Korea | Korean | tmdb |
| 9789 | 명량 | 9500000.0 | 112156811.0 | 2589811.0 | 명량 | tt3541262 | Big Stone Pictures | 2014-07-30 | 2014.0 | Kim Han-min | Choi Min-sik, Ryu Seung-ryong, Cho Jin-woong, Jin Goo | left_only | NaN | 7.000 | 126.0 | War, Action, Drama, History | South Korea | Japanese | tmdb |
| 9790 | 베를린 | 9000000.0 | 48965210.0 | 665210.0 | 베를린 | tt2357377 | CJ Entertainment, Filmmaker R&K, Union Investment Partners | 2013-01-30 | 2013.0 | Ryoo Seung-wan | Ha Jung-woo, Han Suk-kyu, Ryoo Seung-bum, Jun Ji-hyun | left_only | NaN | 6.700 | 120.0 | Action, Thriller, Crime | South Korea | German | tmdb |
| 9791 | 복수는 나의 것 | 4000000.0 | 1954937.0 | 45289.0 | 복수는나의것 | tt0310775 | CJ Entertainment, Studio Box | 2002-03-29 | 2002.0 | Park Chan-wook | Song Kang-ho, Shin Ha-kyun, Bae Doona, Im Ji-eun | left_only | R | 7.464 | 129.0 | Action, Drama, Thriller | South Korea | Korean | tmdb |
| 9792 | 부산행 | 8820000.0 | 2129768.0 | 2129768.0 | 부산행 | tt5700672 | Next Entertainment World, RedPeter Films, Contents Panda, Union Investment Partners, KTB Network | 2016-07-20 | 2016.0 | Yeon Sang-ho | Gong Yoo, Kim Su-an, Jung Yu-mi, Don Lee | left_only | NR | 7.750 | 118.0 | Horror, Thriller, Action, Adventure | South Korea | Korean | tmdb |
| 9793 | 섬 | 50000.0 | 21075.0 | 21075.0 | 섬 | tt0255589 | Myung Films, CJ Entertainment | 2000-04-22 | 2000.0 | Kim Ki-duk | Kim Yu-seok, Suh Jung, Seo Won, Son Min-seok | left_only | NaN | 6.956 | 90.0 | Drama, Thriller | South Korea | Korean | tmdb |
| 9794 | 아가씨 | 8575000.0 | 1983204.0 | 2006788.0 | 아가씨 | tt4016934 | Moho Film, Yong Film, CJ Entertainment | 2016-06-01 | 2016.0 | Park Chan-wook | Kim Min-hee, Kim Tae-ri, Ha Jung-woo, Cho Jin-woong | left_only | R | 8.200 | 145.0 | Thriller, Drama, Romance | South Korea | Japanese | tmdb |
| 9795 | 올드보이 | 3000000.0 | 14980005.0 | 707481.0 | 올드보이 | tt0364569 | Show East, Egg Film, Cineclick Asia | 2003-01-01 | 2003.0 | Park Chan-wook | Choi Min-sik, Yoo Ji-tae, Kang Hye-jung, Kim Byeong-ok | left_only | R | 8.251 | 120.0 | Drama, Thriller, Mystery, Action | South Korea | Korean | tmdb |
| 9796 | 최종병기 활 | 8000000.0 | 49000000.0 | 251200.0 | 최종병기활 | tt2025526 | Lotte Entertainment, Dasepo Club, DCG Plus, Sovik Venture Capital | 2011-08-10 | 2011.0 | Kim Han-min | Park Hae-il, Moon Chae-won, Kim Moo-yul, Ryu Seung-ryong | left_only | NR | 7.200 | 122.0 | Drama, Action, History | South Korea, United States of America | Korean | tmdb |
| 9797 | 피에타 | 103000.0 | 3623330.0 | 21932.0 | 피에타 | tt2299842 | Next Entertainment World, Kim Ki Duk Film, Finecut | 2012-09-05 | 2012.0 | Kim Ki-duk | Cho Min-soo, Lee Jung-jin, Woo Ki-hong, Kang Eun-jin | left_only | NR | 7.100 | 104.0 | Drama | South Korea | Korean | tmdb |